Goto

Collaborating Authors

 Cairns Region


Novel sparse matrix algorithm expands the feasible size of a self-organizing map of the knowledge indexed by a database of peer-reviewed medical literature

Amos, Andrew, Lee, Joanne, Gupta, Tarun Sen, Malau-Aduli, Bunmi S.

arXiv.org Artificial Intelligence

Past efforts to map the Medline database have been limited to small subsets of the available data because of the exponentially increasing memory and processing demands of existing algorithms. We designed a novel algorithm for sparse matrix multiplication that allowed us to apply a self-organizing map to the entire Medline dataset, allowing for a more complete map of existing medical knowledge. The algorithm also increases the feasibility of refining the self-organizing map to account for changes in the dataset over time.


Studying Effective String Theory using deep generative models

Caselle, Michele, Cellini, Elia, Nada, Alessandro

arXiv.org Artificial Intelligence

Effective String Theory (EST) offers a robust non-perturbative framework for describing confinement in Yang-Mills theory by treating the confining flux tube between a static quark-antiquark pair as a thin, vibrating string. While EST calculations are typically carried out using zeta-function regularization, certain problems-such as determining the flux tube width-are too complex to solve analytically. However, recent studies have demonstrated that EST can be explored numerically by employing deep learning techniques based on generative algorithms. In this work, we provide a brief introduction to EST and this novel numerical approach. Finally, we present results for the width of the Nambu-Gotö EST.


Semi-Supervised Federated Learning via Dual Contrastive Learning and Soft Labeling for Intelligent Fault Diagnosis

Dai, Yajiao, Li, Jun, Mei, Zhen, Ni, Yiyang, Jin, Shi, Li, Zengxiang, Guo, Sheng, Xiang, Wei

arXiv.org Artificial Intelligence

--Intelligent fault diagnosis (IFD) plays a crucial role in ensuring the safe operation of industrial machinery and improving production efficiency. However, traditional supervised deep learning methods require a large amount of training data and labels, which are often located in different clients. Additionally, the cost of data labeling is high, making labels difficult to acquire. Meanwhile, differences in data distribution among clients may also hinder the model's performance. T o tackle these challenges, this paper proposes a semi-supervised federated learning framework, SSFL-DCSL, which integrates dual contrastive loss and soft labeling to address data and label scarcity for distributed clients with few labeled samples while safeguarding user privacy. It enables representation learning using unlabeled data on the client side and facilitates joint learning among clients through prototypes, thereby achieving mutual knowledge sharing and preventing local model divergence. Specifically, first, a sample weighting function based on the Laplace distribution is designed to alleviate bias caused by low confidence in pseudo labels during the semi-supervised training process. Second, a dual contrastive loss is introduced to mitigate model divergence caused by different data distributions, comprising local contrastive loss and global contrastive loss. Third, local prototypes are aggregated on the server with weighted averaging and updated with momentum to share knowledge among clients. T o evaluate the proposed SSFL-DCSL framework, experiments are conducted on two publicly available datasets and a dataset collected on motors from the factory. In the most challenging task, where only 10% of the data are labeled, the proposed SSFL-DCSL can improve accuracy by 1.15% to 7.85% over state-of-the-art methods. Dai and Z. Mei are with the School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: { yajiao.dai, J. Li and S. Jin are with the School of Information Science and Engineering, Southeast University, Nanjing, 210096, China (e-mail: jun.li, jinshi@seu.edu.cn).


Multimodal Generative AI with Autoregressive LLMs for Human Motion Understanding and Generation: A Way Forward

Islam, Muhammad, Huang, Tao, Ahn, Euijoon, Naseem, Usman

arXiv.org Artificial Intelligence

This paper presents an in-depth survey on the use of multimodal Generative Artificial Intelligence (GenAI) and autoregressive Large Language Models (LLMs) for human motion understanding and generation, offering insights into emerging methods, architectures, and their potential to advance realistic and versatile motion synthesis. Focusing exclusively on text and motion modalities, this research investigates how textual descriptions can guide the generation of complex, human-like motion sequences. The paper explores various generative approaches, including autoregressive models, diffusion models, Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and transformer-based models, by analyzing their strengths and limitations in terms of motion quality, computational efficiency, and adaptability. It highlights recent advances in text-conditioned motion generation, where textual inputs are used to control and refine motion outputs with greater precision. The integration of LLMs further enhances these models by enabling semantic alignment between instructions and motion, improving coherence and contextual relevance. This systematic survey underscores the transformative potential of text-to-motion GenAI and LLM architectures in applications such as healthcare, humanoids, gaming, animation, and assistive technologies, while addressing ongoing challenges in generating efficient and realistic human motion.


Unified Large Language Models for Misinformation Detection in Low-Resource Linguistic Settings

Islam, Muhammad, Khan, Javed Ali, Abaker, Mohammed, Daud, Ali, Irshad, Azeem

arXiv.org Artificial Intelligence

--The rapid expansion of social media platforms has significantly increased the dissemination of forged content and misinformation, making the detection of fake news a critical area of research. Although fact-checking efforts predominantly focus on English-language news, there is a noticeable gap in resources and strategies to detect news in regional languages, such as Urdu. Advanced Fake News Detection (FND) techniques rely heavily on large, accurately labeled datasets. However, FND in under-resourced languages like Urdu faces substantial challenges due to the scarcity of extensive corpora and the lack of validated lexical resources. Current Urdu fake news datasets are often domain-specific and inaccessible to the public. They also lack human verification, relying mainly on unverified English-to-Urdu translations, which compromises their reliability in practical applications. This study highlights the necessity of developing reliable, expert-verified, and domain-independent Urdu-enhanced FND datasets to improve fake news detection in Urdu and other resource-constrained languages. This paper presents the first benchmark large FND dataset for Urdu news, which is publicly available for validation and deep analysis. We also evaluate this dataset using multiple state-of-the-art pre-trained large language models (LLMs), such as XLNet, mBERT, XLM-RoBERT a, RoBERT a, DistilBERT, and DeBERT a. Additionally, we propose a unified LLM model that outperforms the others with different embedding and feature extraction techniques. The performance of these models is compared based on accuracy, F1 score, precision, recall, and human judgment for vetting the sample results of news. The proposed model outperforms advanced machine learning and deep learning models previously used in the literature for fake news detection.


Computationally Efficient Diffusion Models in Medical Imaging: A Comprehensive Review

Abdullah, null, Huang, Tao, Lee, Ickjai, Ahn, Euijoon

arXiv.org Artificial Intelligence

The diffusion model has recently emerged as a potent approach in computer vision, demonstrating remarkable performances in the field of generative artificial intelligence. Capable of producing high-quality synthetic images, diffusion models have been successfully applied across a range of applications. However, a significant challenge remains with the high computational cost associated with training and generating these models. This study focuses on the efficiency and inference time of diffusion-based generative models, highlighting their applications in both natural and medical imaging. We present the most recent advances in diffusion models by categorizing them into three key models: the Denoising Diffusion Probabilistic Model (DDPM), the Latent Diffusion Model (LDM), and the Wavelet Diffusion Model (WDM). These models play a crucial role in medical imaging, where producing fast, reliable, and high-quality medical images is essential for accurate analysis of abnormalities and disease diagnosis. We first investigate the general framework of DDPM, LDM, and WDM and discuss the computational complexity gap filled by these models in natural and medical imaging. We then discuss the current limitations of these models as well as the opportunities and future research directions in medical imaging.


Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces

Kuhn, Korbinian, Kersken, Verena, Zimmermann, Gottfried

arXiv.org Artificial Intelligence

Despite advances in Automatic Speech Recognition (ASR), transcription errors persist and require manual correction. Confidence scores, which indicate the certainty of ASR results, could assist users in identifying and correcting errors. This study evaluates the reliability of confidence scores for error detection through a comprehensive analysis of end-to-end ASR models and a user study with 36 participants. The results show that while confidence scores correlate with transcription accuracy, their error detection performance is limited. Classifiers frequently miss errors or generate many false positives, undermining their practical utility. Confidence-based error detection neither improved correction efficiency nor was perceived as helpful by participants. These findings highlight the limitations of confidence scores and the need for more sophisticated approaches to improve user interaction and explainability of ASR results.


Word2Minecraft: Generating 3D Game Levels through Large Language Models

Huang, Shuo, Nasir, Muhammad Umair, James, Steven, Togelius, Julian

arXiv.org Artificial Intelligence

We present Word2Minecraft, a system that leverages large language models to generate playable game levels in Minecraft based on structured stories. The system transforms narrative elements-such as protagonist goals, antagonist challenges, and environmental settings-into game levels with both spatial and gameplay constraints. We introduce a flexible framework that allows for the customization of story complexity, enabling dynamic level generation. The system employs a scaling algorithm to maintain spatial consistency while adapting key game elements. We evaluate Word2Minecraft using both metric-based and human-based methods. Our results show that GPT-4-Turbo outperforms GPT-4o-Mini in most areas, including story coherence and objective enjoyment, while the latter excels in aesthetic appeal. We also demonstrate the system' s ability to generate levels with high map enjoyment, offering a promising step forward in the intersection of story generation and game design. We open-source the code at https://github.com/JMZ-kk/Word2Minecraft/tree/word2mc_v0


OptiPMB: Enhancing 3D Multi-Object Tracking with Optimized Poisson Multi-Bernoulli Filtering

Ding, Guanhua, Xia, Yuxuan, Guan, Runwei, Wu, Qinchen, Huang, Tao, Ding, Weiping, Sun, Jinping, Mao, Guoqiang

arXiv.org Artificial Intelligence

Accurate 3D multi-object tracking (MOT) is crucial for autonomous driving, as it enables robust perception, navigation, and planning in complex environments. While deep learning-based solutions have demonstrated impressive 3D MOT performance, model-based approaches remain appealing for their simplicity, interpretability, and data efficiency. Conventional model-based trackers typically rely on random vector-based Bayesian filters within the tracking-by-detection (TBD) framework but face limitations due to heuristic data association and track management schemes. In contrast, random finite set (RFS)-based Bayesian filtering handles object birth, survival, and death in a theoretically sound manner, facilitating interpretability and parameter tuning. In this paper, we present OptiPMB, a novel RFS-based 3D MOT method that employs an optimized Poisson multi-Bernoulli (PMB) filter while incorporating several key innovative designs within the TBD framework. Specifically, we propose a measurement-driven hybrid adaptive birth model for improved track initialization, employ adaptive detection probability parameters to effectively maintain tracks for occluded objects, and optimize density pruning and track extraction modules to further enhance overall tracking performance. Extensive evaluations on nuScenes and KITTI datasets show that OptiPMB achieves superior tracking accuracy compared with state-of-the-art methods, thereby establishing a new benchmark for model-based 3D MOT and offering valuable insights for future research on RFS-based trackers in autonomous driving.


Monolingual and Multilingual Misinformation Detection for Low-Resource Languages: A Comprehensive Survey

Wang, Xinyu, Zhang, Wenbo, Rajtmajer, Sarah

arXiv.org Artificial Intelligence

In today's global digital landscape, misinformation transcends linguistic boundaries, posing a significant challenge for moderation systems. While significant advances have been made in misinformation detection, the focus remains largely on monolingual high-resource contexts, with low-resource languages often overlooked. This survey aims to bridge that gap by providing a comprehensive overview of the current research on low-resource language misinformation detection in both monolingual and multilingual settings. We review the existing datasets, methodologies, and tools used in these domains, identifying key challenges related to: data resources, model development, cultural and linguistic context, real-world applications, and research efforts. We also examine emerging approaches, such as language-agnostic models and multi-modal techniques, while emphasizing the need for improved data collection practices, interdisciplinary collaboration, and stronger incentives for socially responsible AI research. Our findings underscore the need for robust, inclusive systems capable of addressing misinformation across diverse linguistic and cultural contexts.